Skip to content

Harden guest exit, reset, and disk I/O paths#52

Merged
jserv merged 1 commit intomasterfrom
harden
Apr 29, 2026
Merged

Harden guest exit, reset, and disk I/O paths#52
jserv merged 1 commit intomasterfrom
harden

Conversation

@jserv
Copy link
Copy Markdown
Contributor

@jserv jserv commented Apr 29, 2026

  • Switch the disk-image backend to pread/pwrite so a future second virtio-blk virtqueue cannot race on the shared file pointer, and check the fstat return on open so initialization fails loudly on a kernel error instead of silently reading st.st_size from uninitialized stack.

  • Defer the terminal raw-mode switch until guest setup succeeds, so any error during image load or device init renders on a normal tty instead of a half-cooked terminal.

  • Implement virtio-pci device reset. The previous no-op left acked features and the ISR live across re-probe, so a guest reload would observe stale negotiation state. The new reset clears acked features, the ISR, the common-cfg selectors, and the per-virtq indices for any queue that was never enabled. Enabled queues are deliberately left alone because the per-device worker threads poll the descriptor ring without a lock; a full tear-down needs a per-device reset hook that does not exist yet, and clearing state underneath a running worker is worse than leaving it stale.

  • Handle KVM_EXIT_SYSTEM_EVENT in the run loop. SHUTDOWN and RESET are clean exits — a guest panic with panic=-1 reaches us as RESET, which is indistinguishable from a userspace reboot and matches the x86 reboot=k path that comes back as KVM_EXIT_SHUTDOWN. CRASH propagates as -1 so kdump and NMI watchdog signals reach the host exit code.

  • Append panic=-1 (plus reboot=k on x86) to the guest kernel cmdline so a guest panic terminates the VM instead of hanging in panic() until Ctrl-A x. Also document the IRQ map: serial, blk, and net have always been distinct lines, but a stale FIXME suggested otherwise.

  • On arm64, enable in-kernel PSCI 0.2 emulation (KVM_ARM_VCPU_PSCI_0_2 in vcpu features) and advertise it via a device-tree node with method set to hvc. Without this, SYSTEM_RESET issued by panic=-1 is either ignored — leaving the guest spinning — or trapped as undef instead of surfacing to the host loop.


Summary by cubic

Hardened VM shutdown/reset, virtio queue bounds, and disk I/O. Adds proper virtio-pci reset, host-side queue count validation, clean KVM system-event exits, and panic-to-exit by default; disk I/O now uses pread/pwrite and setup errors render on a normal TTY.

  • New Features

    • virtio-pci reset + queue bounds hardening: clears acked features, ISR, selectors, and non-enabled virtq indices; preserves enabled queues; bounds checks use a host num_queues cache to block guest OOB writes.
    • Handle KVM system events: SHUTDOWN/RESET exit 0; CRASH exits -1.
    • Panic-to-exit by default: add "panic=-1" (and "reboot=k" on x86); arm64 enables in-kernel PSCI 0.2 and exposes a /psci FDT node (method "hvc").
  • Bug Fixes

    • Disk I/O uses pread/pwrite to avoid shared file-pointer races; fstat is checked and open fails cleanly on error.
    • Defer switching the terminal to raw mode until after guest setup to keep error output readable.

Written for commit 59530e7. Summary will update on new commits. Review in cubic

cubic-dev-ai[bot]

This comment was marked as resolved.

- Switch the disk-image backend to pread/pwrite so a future second
  virtio-blk virtqueue cannot race on the shared file pointer, and check
  the fstat return on open so initialization fails loudly on a kernel
  error instead of silently reading st.st_size from uninitialized stack.

- Defer the terminal raw-mode switch until guest setup succeeds, so any
  error during image load or device init renders on a normal tty
  instead of a half-cooked terminal.

- Implement virtio-pci device reset. The previous no-op left acked
  features and the ISR live across re-probe, so a guest reload would
  observe stale negotiation state. The new reset clears acked features,
  the ISR, the common-cfg selectors, and the per-virtq indices for any
  queue that was never enabled. Enabled queues are deliberately left
  alone because the per-device worker threads poll the descriptor ring
  without a lock; a full tear-down needs a per-device reset hook that
  does not exist yet, and clearing state underneath a running worker is
  worse than leaving it stale.

  All bounds checks against the queue count now go through a host-side
  num_queues cache rather than the guest-writable common-cfg field. The
  unconditional memcpy in virtio_pci_space_write lets a guest overwrite
  common_cfg.num_queues with any 16-bit value, which had turned the
  reset loop and the queue-select / queue-config write paths into
  OOB-write primitives against vq[]. The cache is set once during
  virtio_pci_set_virtq and is the single source of truth for indexing
  vq[]; the guest mirror remains for the spec-mandated read view.

- Handle KVM_EXIT_SYSTEM_EVENT in the run loop. SHUTDOWN and RESET are
  clean exits — a guest panic with panic=-1 reaches us as RESET, which
  is indistinguishable from a userspace reboot and matches the x86
  reboot=k path that comes back as KVM_EXIT_SHUTDOWN. CRASH propagates
  as -1 so kdump and NMI watchdog signals reach the host exit code.

- Append panic=-1 (plus reboot=k on x86) to the guest kernel cmdline so
  a guest panic terminates the VM instead of hanging in panic() until
  Ctrl-A x. Also document the IRQ map: serial, blk, and net have always
  been distinct lines, but a stale FIXME suggested otherwise.

- On arm64, enable in-kernel PSCI 0.2 emulation
  (KVM_ARM_VCPU_PSCI_0_2 in vcpu features) and advertise it via a
  device-tree node with method set to hvc. Without this, the
  SYSTEM_RESET issued by panic=-1 is either ignored — leaving the guest
  spinning — or trapped as undef instead of surfacing to the host loop.
@jserv jserv merged commit 737d00d into master Apr 29, 2026
10 checks passed
@jserv jserv deleted the harden branch April 29, 2026 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant